This paper studies the segmentation and clustering of speaker speech. To improve the accuracy of speech endpoint detection, the short-time average zero-crossing rate used in the traditional double-threshold method is replaced by the spectral centroid feature, and the threshold is selected from the local maxima of the histogram of the statistical feature sequence, yielding a new speech endpoint detection algorithm. Compared with the traditional double-threshold algorithm, it improves detection accuracy and noise robustness at low SNR. The conventional k-means clustering algorithm requires the number of clusters to be specified in advance and is strongly affected by the choice of initial cluster centers, while the self-organizing neural network algorithm converges slowly and cannot provide accurate clustering information. An improved k-means speaker clustering algorithm based on a self-organizing neural network is therefore proposed. The number of clusters is predicted from the winning behavior of the competitive neurons in the trained network, and the weights of those neurons are used as the initial cluster centers of the k-means algorithm. Experimental results on multiperson mixed speech segmentation show that the proposed algorithm improves the accuracy of speech clustering and compensates for the shortcomings of both the k-means algorithm and the self-organizing neural network algorithm.
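The endpoint-detection idea summarized above (a per-frame spectral centroid, with a threshold taken from the local maxima of the feature histogram) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `spectral_centroid` and `histogram_threshold`, the frame length, hop size, bin count, the choice of the two highest histogram peaks, and the weighted combination `(weight * m1 + m2) / (weight + 1)` are all assumptions made for illustration.

```python
import numpy as np


def spectral_centroid(signal, fs, frame_len=400, hop=200):
    """Per-frame spectral centroid in Hz (frame_len and hop are assumed values)."""
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(signal) - frame_len) // hop
    centroids = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Magnitude-weighted mean frequency; the epsilon avoids 0/0 on silent frames.
        centroids[i] = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    return centroids


def histogram_threshold(feature, bins=20, weight=5.0):
    """Pick a threshold from the local maxima of the feature histogram.

    Assumption: the two highest strict local maxima (m1 < m2 by position)
    are combined as (weight * m1 + m2) / (weight + 1).
    """
    counts, edges = np.histogram(feature, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Pad with zeros so the first and last bins can also be peaks.
    padded = np.concatenate(([0], counts, [0]))
    # Strict local maxima only; flat-topped peaks are ignored in this sketch.
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] > padded[2:])
    peaks = np.flatnonzero(is_peak)
    if len(peaks) < 2:
        # Fallback when the histogram is not bimodal.
        return float(np.mean(feature))
    top2 = np.sort(peaks[np.argsort(counts[peaks])[-2:]])
    m1, m2 = centers[top2]
    return float((weight * m1 + m2) / (weight + 1.0))
```

Under these assumptions, a frame could be kept as a speech candidate when its feature value exceeds the returned threshold. How the thresholds are actually applied and combined across features is not specified in the text above.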